The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
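The patch-based strategy mentioned in the abstract above can be illustrated with a minimal sketch. It only computes where patches would start when a sample is too large to process at once; the function names `patch_starts` and `patch_grid`, the volume size, the patch size, and the stride are all hypothetical choices for illustration and are not taken from the survey.

```python
from itertools import product


def patch_starts(size, patch, stride):
    """Start offsets of patches along one axis, covering the axis fully."""
    if patch >= size:
        return [0]  # the whole axis fits into a single patch
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] != size - patch:
        starts.append(size - patch)  # add a final patch flush with the border
    return starts


def patch_grid(shape, patch_shape, strides):
    """All patch origins for an N-dimensional sample (e.g. a 3D volume)."""
    axes = [patch_starts(s, p, st) for s, p, st in zip(shape, patch_shape, strides)]
    return list(product(*axes))


# Hypothetical 3D volume of 300 x 512 x 512 voxels, tiled into 128^3 patches
# with 50% overlap (stride 64).
origins = patch_grid((300, 512, 512), (128, 128, 128), (64, 64, 64))
print(len(origins), origins[0], origins[-1])
```

Appending a last patch flush with the border is one simple way to cover the whole sample without padding; padding the sample instead would be an equally reasonable design choice.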
translated by 谷歌翻译
Air pollution is a crucial issue affecting human health and livelihoods, as well as one of the barriers to economic and social growth. Forecasting air quality has become an increasingly important endeavor with significant social impacts, especially in emerging countries like China. In this paper, we present a novel Transformer architecture termed AirFormer to collectively predict nationwide air quality in China, with an unprecedented fine spatial granularity covering thousands of locations. AirFormer decouples the learning process into two stages -- 1) a bottom-up deterministic stage that contains two new types of self-attention mechanisms to efficiently learn spatio-temporal representations; 2) a top-down stochastic stage with latent variables to capture the intrinsic uncertainty of air quality data. We evaluate AirFormer with 4-year data from 1,085 stations in the Chinese Mainland. Compared to the state-of-the-art model, AirFormer reduces prediction errors by 5%~8% on 72-hour future predictions. Our source code is available at https://github.com/yoshall/airformer.
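VQA is an ambitious task that aims to answer any image-related question. In reality, however, it is hard to build such a system once and for all, since users' needs are continuously updated and the system has to implement new functions. Continual learning (CL) ability is therefore a must for developing advanced VQA systems. Recently, a pioneering work split a VQA dataset into disjoint answer sets to study this topic. However, CL on VQA involves more than the expansion of label sets (new answer sets). It is crucial to study how to answer questions when deploying VQA systems to new environments (new visual scenes) and how to answer questions that require new functions (new question types). We therefore propose CLOVE, a benchmark for continual learning on visual question answering, which contains scene-incremental and function-incremental settings for the two CL scenarios above. In terms of methodology, the main difference between CL on VQA and CL on classification is that the former additionally involves expanding reasoning mechanisms and preventing their forgetting, while the latter focuses on class representation. We therefore propose a real-data-free, replay-based method tailored for CL on VQA, named Scene Graph as Prompt for Symbolic Replay. Using a piece of scene graph as a prompt, it replays pseudo scene graphs to represent past images, along with the correlated QA pairs. A unified VQA model is also proposed to utilize the current and replayed data to enhance its QA ability. Finally, experimental results reveal the challenges in CLOVE and demonstrate the effectiveness of our method. The dataset and code will be available at https://github.com/showlab/clvqa.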
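When designing a diagnostic model for a clinical application, it is crucial to guarantee the robustness of the model with respect to a wide range of image corruptions. Herein, an easy-to-use benchmark is established to evaluate how neural networks perform on corrupted pathology images. Specifically, corrupted images are generated by injecting nine types of common corruptions into validation images. In addition, two classification metrics and one ranking metric are designed to evaluate prediction and confidence performance under corruption. Evaluating on the two resulting benchmark datasets, we find that (1) a variety of deep neural network models suffer from a significant accuracy decrease (double the error on clean images) and unreliable confidence estimation on corrupted images; (2) the correlation between validation and test errors is low, while replacing the validation set with our benchmark can increase the correlation. Our code is available at https://github.com/superjamessyx/robustness_benchmark.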
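As a rough illustration of the idea in the abstract above (inject a corruption into validation images, then compare the error against the clean error), here is a minimal sketch. Gaussian noise is used as a stand-in corruption, and the `relative_error` ratio is an assumed metric for illustration; neither is confirmed to be among the nine corruption types or the three metrics of the benchmark, and all names and data below are hypothetical.

```python
import random


def add_gaussian_noise(image, sigma, seed=0):
    """Return a noisy copy of a grayscale image (rows of floats in [0, 1])."""
    rng = random.Random(seed)  # fixed seed so the corrupted set is reproducible
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


def error_rate(predictions, labels):
    """Fraction of misclassified samples."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)


def relative_error(clean_preds, corrupted_preds, labels):
    """Corrupted error divided by clean error (a value of 2 means it doubled)."""
    clean = error_rate(clean_preds, labels)
    if clean == 0:
        raise ValueError("clean error is zero; the ratio is undefined")
    return error_rate(corrupted_preds, labels) / clean


# Toy 2 x 2 image and made-up predictions, for illustration only.
image = [[0.2, 0.5], [0.8, 1.0]]
noisy = add_gaussian_noise(image, sigma=0.1)

labels = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
clean_preds = [0, 1, 1, 0, 1, 0, 1, 1, 0, 1]      # 1 mistake
corrupted_preds = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1]  # 2 mistakes
print(relative_error(clean_preds, corrupted_preds, labels))
```

Seeding the noise generator keeps the corrupted validation set identical across runs, which matters if it is meant to serve as a fixed benchmark.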
Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out fine-grained interaction details about the surgical activity; yet those are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as <instrument, verb, target> triplets delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and an assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms by competing teams are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, provides a thorough methodological comparison between them and an in-depth analysis of the results, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and also highlights interesting directions for future research on fine-grained surgical activity recognition, which is of utmost importance for the development of AI in surgery.
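Cognitive science has shown that humans perceive videos in terms of events separated by the state changes of dominant subjects. State changes trigger new events and are among the most useful pieces of information in the large amount of redundant information perceived. However, previous research has focused on the overall understanding of segments without evaluating the fine-grained state changes inside them. In this paper, we introduce a new dataset called Kinetic-GEB+. The dataset consists of 170K boundaries associated with captions that describe state changes in generic events in 12K videos. On this new dataset, we propose three tasks that support the development of a more fine-grained, robust, and human-like understanding of videos through state changes. We evaluate many representative baselines on our dataset, and on this basis we also design a new TPD (Temporal-based Pairwise Difference) modeling method for visual difference that achieves significant performance improvements. In addition, the results show that current methods still face formidable challenges in utilizing different granularities, representing visual difference, and accurately localizing state changes. Further analysis shows that our dataset can drive the development of more powerful methods for understanding state changes and thus improve video-level comprehension. The dataset is available at https://github.com/yuxuan-w/geb-plus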
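It is still a pipe dream that AI assistants on phones and AR glasses can assist our daily life by addressing questions such as "How do I adjust the date on this watch?" and "How do I set its heating duration?" (while pointing at an oven). The queries used in conventional tasks (i.e., video question answering, video retrieval, moment localization) are often factoid and based on pure text. In contrast, we present a new task called Affordance-centric Question-driven Video Segment Retrieval (AQVSR). Each of our questions is an image-box-text query that focuses on items in our daily life and expects relevant answer segments to be retrieved from a corpus of instructional video-transcript segments. To support research on this AQVSR task, we construct a new dataset called AssistSR. We design novel guidelines to create high-quality samples. The dataset contains 1.4K multimodal questions on 1K video segments from instructional videos on diverse daily-use items. To address AQVSR, we develop a simple yet effective model called Dual Multimodal Encoders (DME), which significantly outperforms several baseline methods while still leaving large room for future improvement. Moreover, we present detailed ablation analyses. Our code and data are available at https://github.com/stanlei52/aqvsr.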
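Weight pruning is an effective model compression technique for tackling the challenges of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to accuracy degradation, difficulty in leveraging hardware acceleration, and restrictions on certain types of DNN layers. In this paper, we propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations that are applicable to any type of DNN layer while achieving high accuracy and hardware inference performance. With the flexibility of applying different pruning schemes to different layers, enabled by our compiler optimizations, we further probe into the new problem of determining the best-suited pruning scheme, considering the different acceleration and accuracy performance of various pruning schemes. Two pruning scheme mapping methods, one search-based and the other rule-based, are proposed to automatically derive the best-suited pruning regularity and block size for each layer of any given DNN. Experimental results demonstrate that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework with up to $2.48\times$ and $1.73\times$ DNN inference acceleration on the CIFAR-10 and ImageNet datasets without accuracy loss.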
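To make the notions of pruning regularity and block size more concrete, the following is a minimal sketch of generic block-wise column pruning on a small weight matrix. It is an assumption-laden illustration, not the pruning scheme or compiler optimization proposed in the paper: the function `prune_block_columns`, the L2-norm ranking criterion, the block size, and the keep ratio are all hypothetical.

```python
import math


def prune_block_columns(weights, block_rows, keep_ratio):
    """Zero the weakest columns inside each horizontal block of a weight matrix.

    weights: list of rows (lists of floats); block_rows: rows per block;
    keep_ratio: fraction of columns kept in every block.
    """
    n_cols = len(weights[0])
    n_keep = max(1, round(n_cols * keep_ratio))
    pruned = [row[:] for row in weights]  # leave the input untouched
    for start in range(0, len(weights), block_rows):
        block = range(start, min(start + block_rows, len(weights)))
        # L2 norm of every column, restricted to the rows of this block
        norms = [
            math.sqrt(sum(weights[r][c] ** 2 for r in block))
            for c in range(n_cols)
        ]
        ranked = sorted(range(n_cols), key=lambda c: norms[c], reverse=True)
        for c in ranked[n_keep:]:
            for r in block:
                pruned[r][c] = 0.0
    return pruned


# Toy 4 x 4 layer, split into two blocks of two rows each.
weights = [
    [0.9, 0.1, -0.8, 0.05],
    [1.1, -0.2, 0.7, 0.0],
    [0.1, 1.2, 0.05, -0.9],
    [-0.2, 1.0, 0.1, 0.8],
]
pruned = prune_block_columns(weights, block_rows=2, keep_ratio=0.5)
for row in pruned:
    print(row)
```

Each block keeps its own set of columns, so the sparsity pattern is regular inside a block yet can differ between blocks; a smaller block size gives finer granularity, and a larger one gives more regular structure.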
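Soft actuators have shown great advantages in compliance and morphology for manipulating delicate objects and for inspection in confined spaces. There is an unmet need for a soft actuator that can provide torsional motion, for example to enlarge the workspace and increase the degrees of freedom. Toward this goal, we present origami-inspired soft pneumatic actuators (OSPAs) made from silicone. The prototype can output a rotation of more than one revolution (up to 435°), larger than that of previous counterparts. We describe the design and fabrication method, build the kinematic and simulation models, and analyze and optimize the parameters. Finally, we demonstrate OSPAs by integrating them into a gripper capable of simultaneously grasping and lifting fragile or flat objects, a versatile robot capable of picking and placing items at right angles using the torsional actuators, and a soft snake robot capable of changing its attitude and direction through torsion of the actuators.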
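In genome biology research, regulatory genome modeling is an important topic for many regulatory downstream tasks, such as promoter classification and transcription factor binding site prediction. The core problem is to model how regulatory elements interact with each other and how this varies across different cell types. However, current deep learning methods often focus on modeling the genome sequences of a fixed set of cell types and do not account for the interactions between multiple regulatory elements, so they perform well only on the cell types in the training set and lack the generalizability required in biological applications. In this work, we propose a simple yet effective approach for pre-training on genome data in a multi-modal and self-supervised manner, which we call GeneBERT. Specifically, we simultaneously take 1D genome sequence data and a 2D matrix of (transcription factors x regions) as input, and propose three pre-training tasks to improve the robustness and generalizability of the model. We pre-train our model on an ATAC-seq dataset with 17 million genome sequences. We evaluate GeneBERT on regulatory downstream tasks across different cell types, including promoter classification, transcription factor binding site prediction, disease risk estimation, and splicing site prediction. Extensive experiments demonstrate the effectiveness of multi-modal and self-supervised pre-training for large-scale regulatory genomics data.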